Add SoudenMVDR module #2367
@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
torchaudio/transforms/_transforms.py
specgram (Tensor): Multi-channel complex-valued spectrum.
    Tensor of dimension `(..., channel, freq, time)`
psd_s (Tensor): The complex-valued power spectral density (PSD) matrix of target speech.
    Tensor of dimension `(..., freq, channel, channel)`
is there a check somewhere that enforces equality between the last two dimensions?
Good point. Adding such a check would help users better understand how to use the module.
This check is applicable to `F.rtf_power`, `F.rtf_evd`, `F.mvdr_weights_rtf`, and `F.mvdr_weights_souden`. It's better to add it in a separate PR.
sure sounds good
lg — just some small things
torchaudio/transforms/_transforms.py
Given the multi-channel complex-valued spectrum :math:`\textbf{Y}`, the power spectral density (PSD) matrix
of target speech :math:`\bf{\Phi}_{\textbf{SS}}`, the PSD matrix of noise :math:`\bf{\Phi}_{\textbf{NN}}`, and
a one-hot vector that represents the reference channel :math:`\bf{u}`, the module computes the single-channel
complex-valued spectrum of the enhaned speech :math:`\hat{\textbf{S}}`. The formula is defined as:
Suggested change:
- complex-valued spectrum of the enhaned speech :math:`\hat{\textbf{S}}`. The formula is defined as:
+ complex-valued spectrum of the enhanced speech :math:`\hat{\textbf{S}}`. The formula is defined as:
.. math::
    \textbf{w}_{\text{MVDR}}(f) =
    \frac{{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f){\bf{\Phi}_{\textbf{SS}}}}(f)}
    {\text{Trace}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}
is "tr" more standard?
Suggested change:
- {\text{Trace}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}
+ {\text{tr}({{{\bf{\Phi}_{\textbf{NN}}^{-1}}(f) \bf{\Phi}_{\textbf{SS}}}(f))}}\bm{u}
I saw different notations in different publications: `trace` in https://www.merl.com/publications/docs/TR2016-072.pdf, `Trace` in https://arxiv.org/pdf/2005.10479.pdf, and `Tr` in https://ieeexplore.ieee.org/abstract/document/7952756. For clarity, we can use `Trace` here.
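To make the formula concrete, here is a minimal NumPy sketch. It is not torchaudio's implementation, and the function names `souden_mvdr_weights` and `apply_beamforming` are hypothetical. It computes `w(f) = Phi_NN^{-1}(f) Phi_SS(f) u / Trace(Phi_NN^{-1}(f) Phi_SS(f))` with the shapes from the docstring, and it assumes the enhanced spectrum is obtained with the usual beamforming step `S_hat(f, t) = w(f)^H Y(f, t)`. No regularization (such as diagonal loading) is applied.

```python
import numpy as np


def souden_mvdr_weights(psd_s, psd_n, reference_channel):
    """Hypothetical sketch of the Souden MVDR weights.

    psd_s, psd_n: (..., freq, channel, channel) complex PSD matrices.
    reference_channel: int index of the reference channel (u is its one-hot vector).
    Returns weights of shape (..., freq, channel).
    """
    # Phi_NN^{-1} Phi_SS via a batched linear solve instead of an explicit inverse.
    numerator = np.linalg.solve(psd_n, psd_s)
    # Trace over the two channel dimensions: one value per frequency bin.
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    # Multiplying by the one-hot vector u selects the reference column.
    return numerator[..., :, reference_channel] / trace[..., None]


def apply_beamforming(weights, specgram):
    """Enhanced spectrum S_hat(f, t) = w(f)^H Y(f, t).

    weights: (..., freq, channel); specgram: (..., channel, freq, time).
    Returns (..., freq, time).
    """
    return np.einsum("...fc,...cft->...ft", weights.conj(), specgram)
```

With a rank-one speech PSD `h h^H` and an identity noise PSD, the weights reduce to `h conj(h_ref) / ||h||^2`, so the output equals the reference channel's speech component; this is the distortionless property MVDR is designed for.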
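Add a new design of the MVDR module.
The `SoudenMVDR` module supports the method proposed by Souden et al. The input arguments are the multi-channel complex-valued spectrum, the PSD matrix of target speech, the PSD matrix of noise, and the reference channel.
The output of the module is the single-channel complex-valued spectrum of the enhanced speech.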
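The PSD matrices that `SoudenMVDR` takes as input are commonly estimated from the multi-channel spectrum with a time-frequency mask. A minimal NumPy sketch, assuming a mask-weighted average over time; the function name `mask_weighted_psd` and its `eps` parameter are hypothetical and not a torchaudio API.

```python
import numpy as np


def mask_weighted_psd(specgram, mask, eps=1e-15):
    """Hypothetical helper: mask-weighted PSD estimate.

    specgram: (..., channel, freq, time) complex spectrum Y.
    mask: (..., freq, time) real-valued time-frequency mask.
    Returns (..., freq, channel, channel).
    """
    # Normalize the mask over time so each frequency bin's weights sum to (about) one.
    mask = mask / (mask.sum(axis=-1, keepdims=True) + eps)
    # Sum over time of m(f, t) * Y(f, t) Y(f, t)^H.
    return np.einsum("...ft,...cft,...dft->...fcd", mask, specgram, specgram.conj())
```

One common choice is to pass a speech mask to obtain the target-speech PSD and its complement (`1 - mask`) to obtain the noise PSD.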